-
Stochastic block partitioning (SBP) is a community detection algorithm that is highly accurate even on graphs with a complex community structure, but its inherently serial nature hinders its adoption by the wider scientific community. To make it practical to analyze large real-world graphs with SBP, there is a growing need to parallelize and distribute the algorithm. The current state-of-the-art distributed SBP algorithm is a divide-and-conquer approach that limits communication between compute nodes until the end of inference. This breaks computational dependencies, which causes convergence issues as the number of compute nodes increases and when the graph is sufficiently sparse. To address this shortcoming, we introduce EDiSt, an exact distributed stochastic block partitioning algorithm. Under EDiSt, compute nodes periodically share community assignments during inference. Owing to this additional communication, EDiSt improves upon the divide-and-conquer algorithm: it scales out to a larger number of compute nodes without suffering from convergence issues, even on sparse graphs. We show that EDiSt provides speedups of up to 26.9x over the divide-and-conquer approach and up to 44.0x over shared-memory parallel SBP when scaled out to 64 compute nodes.
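The abstract describes EDiSt only at a high level. The toy simulation below is a hypothetical sketch of the general pattern of periodically sharing community assignments, not EDiSt's implementation. Each simulated compute node owns a subset of vertices and updates only those vertices against its own, possibly stale, copy of the global assignment array. Every `sync_every` rounds, the copies are merged in a simulated all-gather. The names `local_update`, `synchronize`, `run`, `owners`, and `sync_every` are illustrative assumptions, and label propagation stands in for SBP's actual vertex-move step.

```python
from collections import Counter


def local_update(adj, assignment, owned):
    """Stand-in for one round of vertex moves on a single compute node.

    Label propagation is used purely for illustration (it is NOT SBP's move
    rule): each owned vertex adopts the most common community among its
    neighbours, read from this node's possibly stale copy of the assignments.
    All reads use the copy as it was at the start of the round.
    """
    new = list(assignment)
    for v in owned:
        if adj[v]:
            counts = Counter(assignment[u] for u in adj[v])
            # Highest count wins; ties go to the smallest community id.
            new[v] = min(counts, key=lambda b: (-counts[b], b))
    return new


def synchronize(copies, owners):
    """Simulated all-gather: each node contributes the entries it owns,
    and every node receives the merged assignment array."""
    merged = list(copies[0])
    for node, owned in enumerate(owners):
        for v in owned:
            merged[v] = copies[node][v]
    return [list(merged) for _ in copies]


def run(adj, initial, owners, rounds, sync_every):
    """Each simulated node updates only its owned vertices every round;
    assignments are shared across nodes every `sync_every` rounds."""
    copies = [list(initial) for _ in owners]
    for r in range(1, rounds + 1):
        copies = [local_update(adj, copy, owners[node])
                  for node, copy in enumerate(copies)]
        if r % sync_every == 0:
            copies = synchronize(copies, owners)
    return copies


# Usage: two simulated nodes, each owning three vertices of a six-vertex graph.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
copies = run(adj, [0, 1, 2, 3, 4, 5], owners=[[0, 1, 2], [3, 4, 5]],
             rounds=4, sync_every=2)
```

Between synchronizations, a node's view of vertices owned by other nodes grows stale, so `sync_every` trades communication volume against that staleness. The schedule EDiSt actually uses is not stated in this abstract.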
-
Community detection, or graph partitioning, is a fundamental problem in graph analytics with applications in a wide range of domains, including bioinformatics, social media analysis, and anomaly detection. Stochastic block partitioning (SBP) is a community detection algorithm based on sequential Bayesian inference. SBP is highly accurate even on graphs with a complex community structure. However, due to its sequential nature, it does not scale well to large real-world graphs, which can contain upwards of a million vertices. Approximate methods that break computational dependencies improve the scalability of SBP via parallelization and data reduction. However, these relaxations can lead to low accuracy on graphs with complex community structure. In this paper, we introduce additional synchronization steps through vertex-level data batching to improve the accuracy of such methods. We then leverage batching to develop a high-performance parallel approach that improves the scalability of SBP while maintaining accuracy. Our approach is the first to integrate data reduction, shared-memory parallelization, and distributed computation, thus efficiently utilizing distributed computing resources to accelerate SBP. On a one-million-vertex graph processed on 64 compute nodes with 128 cores each, our approach delivers a speedup of 322x over the sequential baseline and 6.8x over the distributed-only implementation. To the best of our knowledge, this Graph Challenge submission is the highest-performing SBP implementation to date and the first to process the one-million-vertex graph using SBP.
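The following is a minimal sketch, under stated assumptions, of how vertex-level batching adds synchronization points to a parallel sweep. Within a batch, every proposal reads the same snapshot of community assignments, so the work can run concurrently; moves are applied only at the end of the batch, so later batches see them. `propose` and `batched_sweep` are hypothetical names, label propagation stands in for SBP's move evaluation, and Python threads stand in for cores (under CPython's GIL they give no real speedup for this CPU-bound work).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def propose(v, adj, snapshot):
    """Stand-in move rule (label propagation, NOT SBP's actual step): the
    most common community among v's neighbours in `snapshot`, ties to the
    smallest id. Isolated vertices keep their current community."""
    if not adj[v]:
        return snapshot[v]
    counts = Counter(snapshot[u] for u in adj[v])
    return min(counts, key=lambda b: (-counts[b], b))


def batched_sweep(adj, assignment, num_batches, workers=4):
    """One sweep over all vertices, split into `num_batches` batches.

    Proposals inside a batch are computed concurrently against the same
    snapshot; moves are applied at the end of each batch, so the next batch
    sees them. num_batches=1 is fully parallel (every read may be stale);
    num_batches=len(assignment) reproduces a sequential sweep in vertex order.
    """
    assignment = list(assignment)
    n = len(assignment)
    # Interleaved batches: batch i holds vertices i, i + num_batches, ...
    batches = [range(i, n, num_batches) for i in range(num_batches)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches:
            snapshot = tuple(assignment)
            moves = list(pool.map(lambda v: propose(v, adj, snapshot), batch))
            # Synchronization point: apply every move in the batch at once.
            for v, block in zip(batch, moves):
                assignment[v] = block
    return assignment
```

On the path graph 0-1-2 with initial assignment [0, 1, 1], a single batch returns [1, 0, 1]: vertex 1 reads stale labels and moves away from its neighbours. Three batches of one vertex each return [1, 1, 1]. More batches mean fresher reads at the cost of less parallelism per batch, loosely mirroring the abstract's point that extra synchronization recovers accuracy lost to broken dependencies.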
-
Community detection in graphs, also known as graph partitioning, is a well-studied NP-hard problem. Various heuristic approaches have been adopted to tackle this problem in polynomial time. One such approach, as outlined in the IEEE HPEC Graph Challenge, is Bayesian statistics-based stochastic block partitioning. This method delivers high-quality partitions in sub-quadratic runtime, but it fails to scale to very large graphs. In this paper, we present sampling as an avenue for speeding up the algorithm on large graphs. We first show that existing sampling techniques can preserve a graph's community structure. We then show that sampling for stochastic block partitioning can produce a speedup of between 2.18× and 7.26× for graph sizes between 5,000 and 50,000 vertices without a significant loss in community detection accuracy.
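As a rough illustration of a sample-then-extend pipeline, the sketch below draws a uniform random vertex sample (only one of several possible sampling techniques), partitions the induced subgraph with a caller-supplied function, and then assigns each unsampled vertex to the most common community among its already-assigned neighbours. All function names are hypothetical, `partition_fn` stands in for running stochastic block partitioning on the sample, and both the extension rule and the fallback to community 0 for vertices with no assigned neighbours are assumptions, not necessarily what the paper does.

```python
import random
from collections import Counter


def sample_vertices(n, fraction, seed=0):
    """Uniform random vertex sampling: pick about `fraction` of the n vertices."""
    rng = random.Random(seed)
    k = min(n, max(1, int(n * fraction)))
    return sorted(rng.sample(range(n), k))


def induced_subgraph(adj, sampled):
    """Keep only edges between sampled vertices, relabelled to 0..k-1."""
    index = {v: i for i, v in enumerate(sampled)}
    return [[index[u] for u in adj[v] if u in index] for v in sampled]


def extend_partition(adj, sampled, sample_assignment):
    """Give each unsampled vertex the most common community among its
    already-assigned neighbours (ties: smallest id; none: community 0)."""
    assignment = [None] * len(adj)
    for i, v in enumerate(sampled):
        assignment[v] = sample_assignment[i]
    for v in range(len(adj)):  # single pass in vertex order
        if assignment[v] is None:
            labels = Counter(assignment[u] for u in adj[v]
                             if assignment[u] is not None)
            assignment[v] = min(labels, key=lambda b: (-labels[b], b)) if labels else 0
    return assignment


def sample_and_partition(adj, fraction, partition_fn, seed=0):
    """Sample, partition the sample with `partition_fn`, then extend."""
    sampled = sample_vertices(len(adj), fraction, seed)
    sub = induced_subgraph(adj, sampled)
    return extend_partition(adj, sampled, partition_fn(sub))
```

For example, with two disjoint triangles {0, 1, 2} and {3, 4, 5}, sampled vertices [0, 1, 3, 4], and sample communities [0, 0, 1, 1], `extend_partition` returns [0, 0, 0, 1, 1, 1].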
